Protein Composition for Inducing an Immune Response in a Vertebrate Comprising a Plurality of Protein Variants Covering the Heterogeneity of a Single Antigen, AMA1

ABSTRACT

The present invention relates to a protein composition suitable for inducing an immune response in a vertebrate comprising 2 to 10 protein variants of a single antigen, which is the Apical Membrane Antigen 1 (PfAMA1 or AMA1) of a Plasmodium species. The antigen comprises a plurality of variable amino acid positions, wherein the amino acid sequences of said protein variants, in combination, represent both the frequency of occurrence of each amino acid at said variable amino acid positions and the linkage between the variable amino acid positions, and wherein said frequency of occurrence is at least between 10% and 20%, such as 10%, 11%, 12%, 13%, . . . , 20%, in the single antigen, and wherein at least 75% of the linkages between the variable amino acid positions are presented by the combination of protein variants.

The present invention relates to a protein composition for inducing an immune response in a vertebrate comprising a plurality of protein variants covering the heterogeneity of a single antigen. The present invention further relates to the protein variants themselves, nucleic acids encoding the protein variants, and expression vectors and host cells for producing the protein variants according to the present invention. The protein composition according to the present invention is especially suitable for providing an immune response against the malaria-causing infectious agent Plasmodium, such as Plasmodium falciparum.

In a relatively large number of cases, an infectious agent, such as Plasmodium or influenza, escapes a previously acquired (protective) immune response by changing one or more of its immune response inducing agents. Because of this, a previously acquired (protective) immune response, for example through vaccination or a previous infection, no longer suffices to control the establishment of the infectious agent.

In a number of cases, the (protective) immune response inducing agents, or antigens, are one or more surface proteins of the infectious agent and the changes involve amino acid substitution in these surface proteins. Such substituted proteins can be referred to as protein variants or polymorphic proteins of the immune response inducing agent or antigen.

The amino acid substitutions are in general limited to a specific subset of amino acid positions, because of the biological function(s) of the (protective) immune response inducing protein. For example, if the biological function of the (protective) immune response inducing protein is attachment to the outer wall of a host cell, the subset of amino acid positions which can be substituted in order to escape the immune system of the host is limited to those amino acid positions not affecting the attachment.

Further, because of the associated biological function(s), the number of possible amino acid substitutions at a specific amino acid position is also limited.

For example, substitution of an acidic amino acid, such as aspartic acid (Asp), for a polar amino acid, such as serine (Ser), could affect the function of the immune response inducing protein and could therefore not be a suitable amino acid substitution for the infectious agent to escape the immune system of the host.

Furthermore, there appears to be a linkage, or correlation, between certain specific amino acid substitutions at specific positions of the antigen. In other words, the presence of, for example, an alanine (Ala) at a certain position in the antigen can be linked or correlated with the presence of, for example, a glycine (Gly) at a nearby or remote amino acid position in the antigen.

However, although the number of possible amino acid substitutions in a (protective) immune response inducing protein is limited, the number of possible variants of this protein remains very high.

An example of this is the immunogenic hemagglutinin (HA) surface glycoprotein of the viral infectious agent influenza. The number of substitutable amino acid positions in this glycoprotein is so high that each year the most prevalent (regional) protein variants of HA have to be selected to provide an effective influenza vaccine for only that year.

Another example is the PfAMA-1 protein of the malaria-causing infectious agent Plasmodium falciparum.

Malaria is estimated to cause up to 500 million clinical cases and 2 million deaths annually. Most of the severe morbidity and mortality occurs through infection with Plasmodium falciparum in young children and pregnant women of sub-Saharan Africa.

Several potential (protective) immune response inducing agents have been identified for vaccine development, one of these being Plasmodium falciparum Apical Membrane Antigen 1 (PfAMA1 or AMA1), encoded by a single copy gene.

Evidence from rodent and non-human primate malaria models shows that antibody responses to AMA1 can reduce levels of infection and that antibodies to AMA1 inhibit asexual parasite multiplication in vitro.

In endemic areas the immune system generates anti-AMA1 antibodies in response to infection and these may correlate with protection.

AMA-1 (FIG. 1) is an 83 kDa protein comprising a large N-terminal ectodomain, a transmembrane region and an approximately 50 amino acid C-terminal cytoplasmic tail. The ectodomain contains 16 conserved cysteine residues that form eight intramolecular disulphide bonds defining a potential three domain structure. Recent crystal structures for AMA1 confirm this three domain structure, but suggest there is considerable interaction between the domains.

Antibodies to AMA1 block merozoite invasion of erythrocytes, merozoite reorientation at the erythrocyte surface, and proteolytic processing. Moreover, asexual blood stage parasites devoid of AMA1 appear not to be viable, suggesting that AMA1 provides a critical and non-redundant biological function during erythrocyte invasion.

AMA1 is also present on sporozoite stages of development suggesting vaccination with AMA1 may target more than just asexual erythrocytic development.

However, similar to hemagglutinin (HA) of influenza, AMA1 is known to be liable to amino acid substitutions providing an escape from an earlier acquired (protective) immune response.

This is exemplified by immunization studies in rabbits showing that, although antibodies obtained to PfAMA1 from one strain of malaria inhibit the growth of the homologous strain well, other strains are inhibited to a variably lesser degree. This suggests that PfAMA1 amino acid substitutions or polymorphism may diminish the efficacy of PfAMA1 based vaccines.

A rather obvious vaccine strategy against an infectious agent, such as influenza or Plasmodium, would be to include all known, or even all theoretically possible, protein variants or polymorphic forms of an antigen in one vaccine preparation in order to induce an effective (protective) immune response against all known, and even future, variants or polymorphs of the infectious agent.

However, such a vaccine strategy would be rather impractical or even impossible because of the large number of protein variants or polymorphic forms involved. For Plasmodium falciparum alone, more than 300 different protein variants or polymorphic forms of the AMA1 protein are already known. As a consequence, a vaccine preparation effective against only the known Plasmodium falciparum polymorphs would already comprise more than 300 protein variants.

Such vaccine preparations are not only difficult, laborious, and expensive to produce, even using present-day recombinant DNA technologies; their therapeutic effectiveness in inducing an immune response would also be questionable. When presented in a single vaccine preparation to the immune system, some protein variants, or parts thereof, would inherently be more immunogenic, thereby inhibiting, or even preventing, the development of an immune response against less immunogenic protein variants or parts thereof.

Further, such a vaccine preparation, comprising all known variants of an antigen, would probably not be effective against future strains of the infectious agent which are likely to develop because of the evolutionary pressure of such a vaccine preparation.

Therefore, it is an objective of the present invention to provide a protein composition or vaccine preparation suitable for inducing a (protective) immune response in a vertebrate against an infectious agent obviating the above drawbacks.

It is a further objective of the present invention to provide a protein composition or vaccine preparation suitable for inducing a (protective) immune response in a vertebrate effective against polymorphic antigens (of infectious agents).

Still a further objective of the present invention is to provide a protein composition or vaccine preparation which can be relatively easy to produce.

Still another object of the present invention is to provide a protein composition or vaccine preparation which is relatively cheap and therefore economically feasible.

These objects and other objects and advantages of the present invention are met by a protein composition as defined in the appended claims.

Specifically, the above objects and other objects and advantages of the present invention are met by a protein composition suitable for inducing an immune response in a vertebrate comprising 2 to 10 protein variants of a single antigen which antigen comprises a plurality of variable amino acid positions, wherein the amino acid sequences of said protein variants, in combination, represent both the frequency of occurrence of each amino acid at said variable amino acid positions and the linkage between the variable amino acid positions, and wherein said frequency of occurrence is at least 10 to 20% in the single antigen, and wherein at least 75% of the linkages between the variable amino acid positions are presented by the combination of protein variants.

Using the above defined protein variants, the present inventors have surprisingly discovered that a single vaccine preparation can be provided which is effective against known and, possibly, future (highly) polymorphic infectious agents.

Because of the relatively small number, i.e., 2 to 10, of protein variants, such a protein composition or vaccine preparation will be easier to produce and more cost-effective compared to a protein composition or vaccine preparation comprising all known naturally occurring protein variants of a polymorphic infectious agent.

Further, the relatively small number, i.e., 2 to 10, of protein variants will reduce, or even eliminate, any immunogenic dominance of certain protein variants or parts thereof.

According to the present invention, the antigen comprises a plurality of variable amino acid positions. Such antigen can be any immunogenic protein of an infectious agent such as hemagglutinin of influenza or AMA1 of Plasmodium, as long as it comprises a number of amino acid positions which are found in nature to be variable or substituted.

Although the present invention is not particularly limited to a specific number of variable or substituted amino acid positions, it logically follows that the present invention will be preferably used when the number of variable amino acid substitutions increases. This is because the number of possible protein variants will also proportionally increase.

The present invention is preferably used when the number of variable amino acid substitutions is at least 10, more preferably at least 20, even more preferably at least 30, and most preferably 40 or more.

The presence and the number of variable amino acid positions in a given antigen can routinely be determined by the skilled person using standard techniques.

For example, all known naturally occurring variants of an antigen can be aligned using the maximum homology between the sequences. An amino acid position in the antigen is designated a variable position if at the corresponding positions in the known antigens more than 1 species of amino acid can be present.

For example, an alignment of 50 known species of an immunogenic protein of 500 amino acids of an infectious agent can reveal that, starting numbering from the N-terminus, amino acid positions 4, 45, 57, 60, 256, 313, 345, 456, 457, 458, 478, 497, and 498 can contain different amino acids or, in other words, the amino acid at each of these positions can, in nature, be substituted by another amino acid.

The “frequency of occurrence” as used herein is defined as the percentage of occurrence of a species of amino acid in the naturally occurring variants. In other words, if amino acid position 45 is in 20% of the naturally occurring variants an Alanine (Ala), in 45% a Glycine (Gly), and in 35% a Valine (Val), then the frequency of occurrence for position 45 is 20% Ala, 45% Gly, and 35% Val.
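As an illustration only, the determination of variable positions and of the frequency of occurrence can be sketched in a few lines of Python. The function names and the toy alignment below are hypothetical and are not part of the described method; the sketch assumes the sequences have already been aligned to equal length and uses 1-based position numbering.

```python
from collections import Counter


def variable_positions(aligned_seqs):
    """Return the 1-based positions at which more than one amino acid occurs."""
    length = len(aligned_seqs[0])
    return [
        pos
        for pos in range(1, length + 1)
        if len({seq[pos - 1] for seq in aligned_seqs}) > 1
    ]


def frequency_of_occurrence(aligned_seqs, position):
    """Return {amino acid: percentage} for one 1-based alignment position."""
    column = [seq[position - 1] for seq in aligned_seqs]
    counts = Counter(column)
    return {aa: 100.0 * n / len(column) for aa, n in counts.items()}


# Hypothetical toy alignment of 20 two-residue sequences.
# Column 1 is conserved (M); column 2 stands in for "position 45" of the text.
toy_alignment = ["MA"] * 4 + ["MG"] * 9 + ["MV"] * 7

toy_variable = variable_positions(toy_alignment)
toy_frequencies = frequency_of_occurrence(toy_alignment, 2)
```

With this toy input, only the second column is reported as variable, and its frequencies reproduce the 20% Ala, 45% Gly, and 35% Val of the example above.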

The “linkage between the variable amino acid positions”, as used herein, is defined as the statistically significant occurrence, i.e., p<0.05, in a natural variant of a single amino acid or stretch of amino acids at (a) variable position(s) in combination with a single amino acid or stretch of amino acids at (an)other variable position(s). For example, the occurrence of the amino acid sequence “TEND” at variable positions 45, 46, 47, and 48 is designated “linked” if its occurrence is statistically correlated with the occurrence of the amino acid valine (Val) at position 123.
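The definition above does not name a particular statistical test. As one possible reading, the sketch below applies a two-sided Fisher's exact test to the 2x2 table of co-occurrence counts and additionally requires a positive association. The choice of test, the function names and the example sequences are assumptions made for illustration.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact p-value for the 2x2 table [[a, b], [c, d]]."""
    row1, col1, n = a + b, a + c, a + b + c + d

    def weight(x):
        # Unnormalised hypergeometric probability of x in the top-left cell.
        return comb(col1, x) * comb(n - col1, row1 - x)

    lowest = max(0, row1 - (n - col1))
    highest = min(row1, col1)
    observed = weight(a)
    # Sum all tables that are at most as probable as the observed one.
    extreme = sum(
        weight(x) for x in range(lowest, highest + 1) if weight(x) <= observed
    )
    return extreme / comb(n, row1)


def is_linked(aligned_seqs, pos_a, aa_a, pos_b, aa_b, alpha=0.05):
    """Assumed criterion: positive association with Fisher p-value below alpha."""
    a = b = c = d = 0
    for seq in aligned_seqs:
        has_a = seq[pos_a - 1] == aa_a
        has_b = seq[pos_b - 1] == aa_b
        if has_a and has_b:
            a += 1
        elif has_a:
            b += 1
        elif has_b:
            c += 1
        else:
            d += 1
    positive = a * d > b * c  # co-occurrence more frequent than expected
    return positive and fisher_exact_two_sided(a, b, c, d) < alpha


# Hypothetical two-residue sequences: A at position 1 mostly travels with G at 2.
linked_toy = ["AG"] * 8 + ["AS"] * 2 + ["VG"] * 1 + ["VS"] * 5
unlinked_toy = ["AG"] * 4 + ["AS"] * 4 + ["VG"] * 4 + ["VS"] * 4
```

In the first toy set the pair (Ala at position 1, Gly at position 2) passes the assumed criterion; in the second, where all four combinations are equally common, it does not.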

The term “in combination”, as used herein is defined as the combination of all protein variants according to the present invention aligned using the maximum homology.

For example, “combined” for variable amino acid position 345 would mean that all amino acids present at this position would be taken into account. If position 345 is in 30% of the protein variants an alanine (Ala) and in 70% of the protein variants a valine (Val), then combined for this specific position would mean that the frequency of occurrence in the combined protein variants is 30% Ala and 70% Val.

With respect to the linkages, a linkage is regarded as preserved in combination if this particular linkage can be found in any of the protein variants. In other words, the linkage can be found in at least one of the protein variants.
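A minimal sketch of this check, assuming each linkage is written as a tuple of (position, amino acid) pairs and each protein variant as an aligned string with 1-based positions. The names and toy data are hypothetical.

```python
def linkage_preserved(variants, linkage):
    """True if at least one variant carries every (position, residue) pair."""
    return any(
        all(variant[pos - 1] == aa for pos, aa in linkage) for variant in variants
    )


def linkage_coverage(variants, linkages):
    """Fraction of the given linkages found in at least one protein variant."""
    kept = sum(1 for linkage in linkages if linkage_preserved(variants, linkage))
    return kept / len(linkages)


# Hypothetical three-residue variants and four natural linkages.
toy_variants = ["AGV", "VSV", "AGL"]
toy_linkages = [
    ((1, "A"), (2, "G")),  # present in AGV and AGL
    ((1, "V"), (2, "S")),  # present in VSV
    ((2, "S"), (3, "L")),  # present in no variant
    ((1, "A"), (3, "L")),  # present in AGL
]

toy_coverage = linkage_coverage(toy_variants, toy_linkages)
```

Here three of the four linkages are preserved, so the toy combination just reaches the 75% level referred to in this description.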

The term “represent” as used herein is used to indicate that the frequency of occurrence of an amino acid at a specific position in nature, is reflected in the protein variants according to the present invention. This does not imply that the natural frequency of occurrence should also be found in the combination of protein variants according to the present invention.

It rather implies that if an amino acid at a certain variable position is more frequently found in nature, this more frequent occurrence should, if possible, also be reflected in the combination of protein variants according to the present invention.

For example, if in nature the amino acid valine (Val) is found in 60% of the cases, glycine (Gly) in 30% of the cases, and serine (Ser) in 10% of the cases at position 456, assuming that 6 protein variants are used for the protein composition, then Val is preferably present in 3 protein variants, Gly in 2 protein variants and Ser in 1 protein variant at this position.

If, in the above situation, only 4 protein variants are used, then Val is preferably present in 2 protein variants, Gly in 1 protein variant and Ser in 1 protein variant at this position, reflecting the prevalence for Val at this position in the majority of cases.
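The two allocations above are consistent with a simple rule, shown here as an assumption rather than as the rule actually used: every amino acid under consideration first receives one protein variant, and any remaining variants are distributed in proportion to the natural frequencies using largest remainders. The function name is hypothetical.

```python
def allocate_variants(freqs, n_variants):
    """Distribute n_variants over amino acids given {amino acid: percent}.

    Assumed rule: one variant per amino acid, then largest-remainder
    apportionment of the spare variants by natural frequency.
    """
    if n_variants < len(freqs):
        raise ValueError("need at least one protein variant per amino acid")
    total = sum(freqs.values())
    spare = n_variants - len(freqs)
    counts, remainders = {}, {}
    for aa, freq in freqs.items():
        whole, rest = divmod(spare * freq, total)
        counts[aa] = 1 + int(whole)
        remainders[aa] = rest
    leftover = n_variants - sum(counts.values())
    # Hand the last variants to the amino acids with the largest remainders.
    for aa in sorted(freqs, key=remainders.get, reverse=True)[:leftover]:
        counts[aa] += 1
    return counts


natural = {"Val": 60, "Gly": 30, "Ser": 10}
with_six = allocate_variants(natural, 6)
with_four = allocate_variants(natural, 4)
```

For the frequencies of the example, this rule yields 3, 2 and 1 variants when six variants are used and 2, 1 and 1 when four are used. Other rounding rules could give different splits.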

The number of protein variants in a specific protein composition or vaccine preparation is dependent on the amino acid variability of the antigen or the number of known sequence variants. Such variability can dictate that at least 2, 3, 4, 5, 6, 7, 8, 9, or 10 protein variants need to be used.

According to the present invention, the relation between the number of protein variants and the variability of the antigen is determined by the variable amino acid positions in the antigen showing the largest variability, i.e., number of different amino acids.

For example, if amino acid position 258 shows the largest variability by dictating three possible amino acid substitutions at this position, then the minimal number of protein variants should be at least 3, so that this variability can be reflected in the combination of protein variants. In other words, at least 3 protein variants are needed that represent the 3 possible amino acids at this position.

If 60% of the cases show Val at this position and 20% of the cases each show Gly or Ser at this position, then preferably, the number of protein variants is 4, allowing 2 protein variants with Val at position 258, 1 protein variant with Ser at this position and 1 protein variant with Gly at this position.

It is surprisingly found by the present inventors that by using the frequency of occurrence and the linkage, whereby only amino acids at variable positions with a frequency of occurrence of at least 10-20%, such as at least 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, or 20% are taken into account, and at least 75% such as 75, 80, 85, 90, 95, 99, 100% of the linkages found in nature are preserved in the combination of protein variants, a protein composition can be prepared which is able, using a limited number of proteins, to substantially reflect the immunogenic repertoire of a (highly) polymorphic naturally occurring protein.

According to a preferred embodiment of the present invention, the single antigen in the protein composition is the AMA1 protein of a Plasmodium species selected from the group consisting of falciparum, vivax, knowlesi, malariae and ovale, preferably Plasmodium falciparum.

As already indicated above, the number of known variants of the AMA1 protein of Plasmodium is already well above 300. In addition, the number of variable amino acid positions in this protein is well over 40.

Because the present invention provides a single protein composition capable of immunogenically covering, using a limited number of protein variants, such high variability, the protein composition according to the present invention is especially suited to be used for a vaccine based on the AMA1 protein.

In an especially preferred embodiment of the protein composition according to the present invention, the variable amino acid positions are positions 162, 167, 172, 173, 175, 187, 190, 196, 197, 200, 201, 204, 206, 207, 225, 230, 242, 243, 267, 282, 283, 285, 296, 300, 308, 332, 393, 404, 405, 407, 435, 439, 448, 451, 485, 493, 496, 503, 512, and 544 of the AMA1 protein of Plasmodium falciparum.

It has been found by the present inventors that when these variable amino acid positions are taken into account when developing the protein composition according to the present invention, substantially all relevant immunogenic variants of this protein in naturally occurring variants of Plasmodium falciparum are covered.

The protein composition according to the present invention preferably comprises at least three protein variants selected from the group consisting of SEQ ID Nos 1 to 12, more preferably at least three protein variants selected from the group consisting of SEQ ID Nos 1 to 6 and most preferably three protein variants comprising SEQ ID No 1, SEQ ID No 2, and SEQ ID No 3.

In a particularly preferred embodiment, the protein variants of the present protein composition are linked. Such linkage can be provided either chemically or through recombinant DNA technology using standard methods well known to the skilled person.

For example, separate nucleic acid sequences encoding the different protein variants can be linked by, for example, PCR or ligation, using a linker nucleic acid sequence. After inserting such a linked construct in a suitable expression vector, the construct encoding the linked protein variants is expressed as a single protein molecule which can, after suitable isolation, be incorporated in the protein composition.

Because of the inventive concept of the protein composition according to the present invention, the present invention also relates to protein variants chosen from the group consisting of SEQ ID Nos 1 to 12, preferably protein variants chosen from the group consisting of SEQ ID Nos 1 to 6, and most preferably protein variants chosen from the group consisting of SEQ ID Nos 1 to 3.

Similarly, the present invention also relates to nucleic acid sequences encoding the above protein variants, preferred protein variants, and most preferred protein variants, preferably SEQ ID Nos 12 to 24, more preferably SEQ ID Nos 12 to 18, and most preferably SEQ ID Nos 12 to 14.

These nucleic acid sequences can be suitably inserted, ligated or recombined in an expression vector under the control of, and operably linked to, suitable regulation and propagation signals such as a promoter, a terminator, secretion signals, enhancer(s), origin(s) of replication, a selection marker, etc.

Hence, the present invention also relates to expression vectors comprising a DNA sequence according to the present invention, preferably pPicZalpha or pPIC9, allowing expression of the protein variants in the preferred methylotrophic yeast Pichia pastoris.

Such an expression vector is preferably transformed or transfected, although integration into the genome can also be envisaged, in a suitable host cell allowing expression of the protein variants according to the present invention.

Hence the present invention also relates to a host organism transformed or transfected with the above defined expression vectors. The host organism is preferably Pichia pastoris.

Because of the advantageous immunogenic properties of the protein composition according to the present invention, the invention also relates to the use of a protein composition according to the present invention for the preparation of a medicament for vaccinating a vertebrate, preferably against malaria. The vertebrate is preferably a human.

The protein composition according to the present invention can be obtained using a method for producing a protein composition, comprising:

a) determining, in an antigen, the variable amino acid positions; the frequency of occurrence of each amino acid at said variable amino acid positions; and the linkage between the variable amino acid positions;

b) determining the variable amino acid position comprising the maximal number X of different amino acids, wherein the frequency of occurrence of each different amino acid is at least 10-20%;

c) designing at least X protein variants representing, in combination, the frequency of occurrence of each amino acid at each variable position.
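Step b) can be sketched as follows, assuming the frequencies from step a) are held in a nested dictionary (a hypothetical data layout) and taking 10% as the threshold. The function name and the example values are illustrative only.

```python
def minimal_variant_count(freq_table, threshold=10.0):
    """Return (X, position) for the position with the most qualifying residues.

    freq_table maps position -> {amino acid: percent}. Only amino acids whose
    frequency of occurrence is at least `threshold` percent are counted.
    """
    best_count, best_position = 0, None
    for position, freqs in freq_table.items():
        qualifying = sum(1 for freq in freqs.values() if freq >= threshold)
        if qualifying > best_count:
            best_count, best_position = qualifying, position
    return best_count, best_position


# Hypothetical frequency table for three variable positions.
toy_table = {
    258: {"Val": 60.0, "Gly": 20.0, "Ser": 20.0},
    345: {"Ala": 30.0, "Val": 70.0},
    456: {"Val": 88.0, "Gly": 7.0, "Ser": 5.0},  # rare residues fall below 10%
}

x_value, x_position = minimal_variant_count(toy_table)
```

In this toy table position 258 carries three amino acids above the threshold, so at least X = 3 protein variants would be designed in step c).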

In a preferred embodiment, the design of the at least X protein variants further represents, in combination, at least 75% of the linkage between the variable amino acid positions.

In a particularly preferred embodiment of the method according to the present invention, step (c) comprises designing protein variants Y₁ to Yₓ by assigning to a variable amino acid position in protein variant Y₁ the most frequently occurring amino acid at this position and to Yₓ the least frequently occurring amino acid at this position, and assigning to the corresponding amino acid position in protein variants Y₂ to Yₓ₋₁ either the remaining amino acid(s) or the same amino acids depending on their frequency of occurrence, under the proviso that the established linkage between the variable amino acid positions is preserved in at least 75% of the cases.

The present invention will be further illustrated using an example showing, in detail, a preferred embodiment thereof. In the example reference is made to the appended figures wherein:

FIGURES

FIG. 1 is a schematic representation of the AMA1 protein of Plasmodium falciparum;

FIG. 2 shows a summary of the frequency of occurrence of the amino acids of the AMA1 protein of Plasmodium falciparum. Indicated are the amino acid position, the amino acids at the position, their frequencies and the number of AMA1 sequences in which the indicated amino acid was found;

FIG. 3 shows the upstream linkages between the different variable amino acid positions from variable amino acid position 162;

FIG. 4 shows the downstream linkages between the different variable amino acid positions from variable amino acid position 162;

FIG. 5 shows alignment of DiCo1, DiCo2, DiCo3 and the yeast expressed HB3-AMA1, 3D7-AMA1 and FVO-AMA1, as well as the consensus sequence that was derived from the alignment of the 356 PfAMA1 sequences obtained from the Genbank database;

FIG. 6 shows differences between Pichia-expressed sequences in domains 1 to 3. Upper right: total differences between alleles and lower left: differences according to domain;

FIG. 7 panel A shows SDS-PAGE of purified DiCo's. Lane 1, Bench mark marker; lane 2, DiCo 1; lane 3, DiCo 2; lane 4, DiCo 3; lane 5, DiCo mix. Panel B shows a Western blot with 4G2 as primary antibody. Lane 1, molecular weight marker; lane 2, DiCo 1; lane 3, DiCo 2; lane 4, DiCo 3; lane 5, DiCo mix. Samples were not reduced;

FIG. 8 shows IgG antibody levels to various AMA-1 variants. Symbols refer to single rabbits. Within each treatment group, the same symbols represent the same animals. Circles: rabbit 1, crosses: rabbit 2, triangles: rabbit 3, squares: rabbit 4 and diamonds: rabbit 5;

FIG. 9 shows growth inhibiting titres versus IgG concentration for individual rabbits according to vaccine antigen and assay strain. Symbols refer to single rabbits. Within each treatment group, the same symbols represent the same animals. Circles: rabbit 1, crosses: rabbit 2, triangles: rabbit 3, squares: rabbit 4 and diamonds: rabbit 5; and

FIG. 10 shows growth inhibiting titres versus IgG concentration for groups of rabbits according to vaccine antigen and assay strain. Open circles: DiCo mix, crosses: DiCo 1, triangles: DiCo 2, squares: DiCo 3 and diamonds: FVO-AMA-1.

EXAMPLE

A Diversity-Covering Approach to Immunisation with Plasmodium falciparum AMA1; Broader Allelic Recognition and Growth Inhibition

Summary

Plasmodium falciparum AMA1 (PfAMA1), a candidate malaria vaccine, is polymorphic. This polymorphism is believed to be generated predominantly under immune selection pressure, and as a result may compromise attempts at vaccination.

355 PfAMA1 sequences obtained predominantly from parasites in endemic areas were aligned and analysed and it was shown that about 10% of the 622 amino acid residues have the potential to vary between alleles (FIG. 2). Linkages between polymorphic residues have also been identified (FIGS. 3 and 4). From this analysis three Diversity-Covering (DiCo) PfAMA1 sequences were generated that take account of linkages and, when taken together, incorporate 80% of all amino acid variability. For each of the three DiCo sequences a synthetic gene was constructed and used to transform the methylotrophic yeast Pichia pastoris, allowing recombinant expression at yields between 50 and 100 mg/L. All three DiCo proteins were reactive with the reduction sensitive monoclonal antibody 4G2, indicating that the DiCo's had a similar conformation to naturally occurring PfAMA1.

Rabbits were immunized with FVO strain PfAMA1 or with the DiCo's either individually or as a mixture. Antibody titers and the ability to inhibit parasite growth in vitro were determined. Animals immunized with the DiCo mix performed similarly to animals immunized with FVO AMA1 when measured against FVO strain parasites, but outperformed animals immunized with FVO AMA1 when assessed against other strains. The levels of growth inhibition (70%) induced by the mix of three DiCo's were comparable for FVO, 3D7 and HB3, suggesting a considerable degree of variation in AMA1 is adequately covered.

This indicates that vaccines based upon the DiCo mix approach provide a broader functional immunity than immunisation with a single allele.

Materials and Methods

Cloning of Diversity Covering AMA-1 Sequences in Pichia pastoris.

Synthetic genes with optimised codon usage for Pichia pastoris were designed for AMA1 proteins comprising domains I, II and III of DiCo1, DiCo2 and DiCo3 (DNA2.0, San Diego). These sequences were PCR-amplified with primers X1 (5′ gcg aat tca ttg aaa ttg ttg aaa gat c 3′, SEQ ID No: 26) and Y1 (5′ ggg gta cca aca tct tat cgt aag ttg g 3′, SEQ ID No: 27), X1 and Y2 (5′ ggg gta ccg aca tgt tat cgt aag ttg gc 3′, SEQ ID No: 28) and X1 and Y1 for DiCo 1, 2 and 3 respectively.

The PCR products were cloned into the EcoRI-KpnI sites of the pPicZA vector (Invitrogen, Groningen), whereby the ATG start codon (Met) of this vector was used for translation initiation, and the resulting constructs were used to transform Escherichia coli DH5 cells. After transformation, plasmids were isolated, checked for the presence of the expected restriction sites and then used to transform Pichia pastoris KM71H following the manufacturer's protocols.

Transformed Pichia pastoris colonies were tested for protein production by culture in 10 mL glycerol-containing medium in a 50 mL tube for 48 hours at 29-30° C. under vigorous shaking. Cells were harvested by centrifugation (5 minutes at 2500 rpm, table top centrifuge) and resuspended in 4 mL methanol-containing medium, and then cultured for 24 hours at 29-30° C. under vigorous shaking. After low-speed centrifugation, the culture supernatant was harvested and 20 μl was tested on SDS-gel for proteins with the expected size. The identity was confirmed by western blotting with the reduction-sensitive monoclonal antibody 4G2.

Protein Production

Fermentation runs were performed in either 3- or 7-litre fermentors (Applikon, Schiedam, The Netherlands), with initial starting volumes of 1 and 2 litres, respectively. The Pichia pastoris clones were grown in BMGY (1% yeast extract, 2% peptone, 1.34% yeast nitrogen base, 1% glycerol, 0.4 mg of biotin per litre, 0.1 M K-phosphate, pH 6.0) at 30° C. for 24 hours. Of this culture, 50 ml/litre was used to inoculate the fermentor containing minimal salt fermentation medium (per litre of medium: MgSO₄.7H₂O, 14.9 grams; K₂SO₄, 18.2 grams; CaCl₂, 0.65 grams; KOH, 4.13 grams; glycerol, 40 grams; 26.7 ml 85% H₃PO₄ and 12 ml PTM1 trace salts solution).

During the batch phase air was sparged at 1 litre per litre initial medium volume. Temperature was kept constant throughout the fermentation at 30° C., pH was kept at 6.0, using 25% NH₄OH and 85% H₃PO₄. Fed batch was started after the dissolved oxygen levels were back to 21%, after being down to (almost) zero, generally between 18 and 24 hours.

Subsequently, the culture was fed 50% glycerol fortified with 12 mL/litre PTM1 trace salts at a rate of 32 mL/hour per litre initial medium volume for 20 to 24 hours. During this fed-batch phase of the fermentation, the medium was sparged with 100% oxygen to keep the dissolved oxygen concentration at 21%. Lastly, the culture was induced by the addition of methanol. The first 3 hours at a rate of 1 mL/hour per litre initial medium volume, increasing to 3 mL/hour per litre initial medium volume in 3 hours, continuing at this rate for approximately 18 hours.

During the induction phase oxygen levels were maintained at 21% with 100% oxygen, but air was sparged continuously into the medium as well (at 1 litre per litre initial fermentation medium), as it was observed that keeping the oxygen level at 21% with 100% oxygen only prevented expression of some proteins.

After induction the pH of medium was increased to pH 7.8 and cooled with a cryostat to 15° C. or lower. Cells were removed by centrifugation (25 minutes, 5000×g, 4° C.) and culture medium was filtered through a 0.22 μm filter using a Quixstand hollow fibre cartridge (GE Healthcare, Etten-Leur, The Netherlands) to remove all remaining yeast cells.

The protein in the culture supernatants (2 or more Litres) were concentrated to approximately 100 mL using a Quixstand with a 10 kDa cut-off hollow fibre column, and diluted with an equal volume of demineralised water and concentrated again. This procedure was repeated four times.

Thereafter, the proteins were bound to a hydroxyapatite column (REF), equilibrated with 1 mM sodium phosphate buffer pH 8.0. The protein was eluted from the column with 50 mM sodium phosphate buffer pH 8.0. This fraction was concentrated to less than 1 mL and subsequently loaded on a Superdex 75 preparative size exclusion chromatography column (REF). The fractions containing the DiCo protein were pooled, concentrated and sterilized by filtration through a 0.22 μm filter.

Rabbit Immunizations.

Rabbits were housed, immunised and bled by Eurogentec SA, Seraing, Belgium, according to national animal welfare regulations. Five groups of five rabbits were immunised on days 0, 28 and 56 with Pichia-expressed DiCo1 (30 μg), DiCo2 (30 μg), DiCo3 (30 μg), FVO PfAMA-1 D123 (30 μg), or a mixture of DiCo1, DiCo2 and DiCo3 (10 μg each, in total 30 μg). Montanide ISA 51 (SEPPIC, Paris, France) was used as the adjuvant. Vaccine formulations were prepared according to the manufacturer's instructions (50/50 mass/mass). Antisera obtained two weeks after the third immunisation (day 70) were tested for reactivity by ELISA, and for functional capacity by the in vitro Parasite Inhibition Assay.

ELISA

Enzyme-linked immunosorbent assay (ELISA) was performed in duplicate on serum samples in 96 well flat-bottomed microtitre plates (Greiner, Alphen a/d Rijn, The Netherlands), coated with 500 ng/mL purified AMA1 antigens according to published methods (Kocken 2002).

The secondary antibody was anti-rabbit IgG conjugated to alkaline phosphatase (Pierce, Rockford, Ill.). A standard curve was applied on each plate and titres of the unknowns were calculated by a four-parameter fit. Titres are expressed as arbitrary units, where 1 AU yields an OD of 1.0. Thus the amount of AU of a sample is the reciprocal dilution at which an OD of 1 will be achieved.
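As an illustration of how a titre in arbitrary units may be read from a four-parameter standard curve, the following minimal sketch evaluates a four-parameter logistic function and its inverse. The parameter values, the function names and the convention that the curve is expressed in AU present in the well are assumptions made for this example only; the curve fitting itself is not shown and the actual fitting software used is not specified here.

```python
def four_pl(x, a, b, c, d):
    """Four-parameter logistic curve: OD as a function of antibody amount x (in AU).

    a: OD at zero antibody, d: OD at saturation,
    c: inflection point, b: slope factor.
    """
    return d + (a - d) / (1.0 + (x / c) ** b)


def inverse_four_pl(od, a, b, c, d):
    """Antibody amount (AU) that yields the given OD on the standard curve."""
    if not min(a, d) < od < max(a, d):
        raise ValueError("OD lies outside the range of the standard curve")
    return c * ((a - d) / (od - d) - 1.0) ** (1.0 / b)


def titre_au(od, dilution_factor, params):
    """Titre of the undiluted sample: AU read from the curve times the dilution."""
    return inverse_four_pl(od, *params) * dilution_factor


# Hypothetical fitted parameters (a, b, c, d), chosen so that 1 AU gives OD 1.0.
HYPOTHETICAL_PARAMS = (0.0, 1.0, 1.0, 2.0)

if __name__ == "__main__":
    print(four_pl(1.0, *HYPOTHETICAL_PARAMS))          # OD produced by 1 AU
    print(titre_au(1.5, 1000, HYPOTHETICAL_PARAMS))    # sample read at a 1:1000 dilution
```

With these hypothetical parameters, a sample that reads an OD of 1.0 at a 1:1000 dilution would be assigned 1000 AU, in line with the definition of the arbitrary unit given above.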

IgG Purification

Antibodies to be used for parasite inhibition assays were purified on protein A columns (Sigma, St Louis, Mo.) using standard protocols, exchanged into RPMI 1640 using Amicon concentrators (30 kDa cutoff), filter-sterilised and stored at −20° C. until use. IgG concentrations were determined using a Nanodrop ND-1000 spectrophotometer (Nanodrop Technologies, Wilmington, Del., USA).

Parasites

P. falciparum strains NF54, FCR3, HB3 were cultured in vitro using standard Plasmodium falciparum culture techniques in an atmosphere of 5% CO₂, 5% O₂ and 90% N₂. FCR3 AMA-1 (accession no. M34553) differs by 1 amino acid in the pro-sequence from FVO AMA-1 (accession no. AJ277646).

In Vitro Parasite Inhibition Assay (PIA)

The effect of purified IgG antibodies on parasite invasion was evaluated in triplicate using 96 well flat-bottomed plates (Greiner) with in vitro matured Plasmodium falciparum schizonts at a starting parasitemia of 0.2-0.4%, a haematocrit of 2.0% and a final volume of 100 μL containing 10% normal human serum, 20 μg mL⁻¹ gentamicin in RPMI 1640.

After 40 to 42 hours, cultures were resuspended, and 50 μL was transferred into 200 μL ice-cold PBS. The cultures were then centrifuged, the supernatant was removed and the plates were frozen. Inhibition of parasite growth was estimated using the pLDH assay as previously described (Kennedy 2002). Parasite growth inhibition, reported as a percentage, was calculated as follows:

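$$\text{Inhibition (\%)} = 100 - \frac{\mathrm{OD}_{\text{experimental}} - \mathrm{OD}_{\text{background}}}{\mathrm{OD}_{\text{control}} - \mathrm{OD}_{\text{background}}} \times 100$$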

Control IgG was isolated from rabbits that had been immunised with adjuvant only.
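By way of example, the calculation of percentage growth inhibition from the three optical densities can be sketched as follows. The function name and the example OD values are hypothetical and serve only to illustrate the arithmetic.

```python
def growth_inhibition(od_experimental, od_control, od_background):
    """Percentage parasite growth inhibition from pLDH optical densities."""
    control_signal = od_control - od_background
    if control_signal <= 0:
        raise ValueError("control OD must exceed background OD")
    return 100.0 - (od_experimental - od_background) / control_signal * 100.0


if __name__ == "__main__":
    # Hypothetical readings: test well, control IgG well, background well.
    print(growth_inhibition(0.4, 1.0, 0.2))
```

A test well whose background-corrected signal is one quarter of the control signal thus corresponds to 75% inhibition, and a well equal to the control corresponds to 0%.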

Statistical Analysis

IgG titres were compared with Analysis of Variance (ANOVA) using log-transformed IgG titres as dependent variable and vaccine antigen as independent variable. PIA titres were analysed with linear mixed effects models, to allow for the simultaneous comparison of various IgG concentrations, whilst correcting for pseudoreplication (Paterson et al).

The PIA titre was entered as the dependent variable and log-transformed total IgG and treatment group as independent variables. Various models were fitted for each strain tested and the best fitting model was selected based on log-likelihood.
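The comparison of IgG titres may be illustrated by the following minimal sketch, which computes the F statistic of a one-way ANOVA on log-transformed titres. The function name and the titres are hypothetical; the conversion of F into a p value and the linear mixed effects models used for the PIA data require a statistics package and are not shown.

```python
import math


def one_way_anova_f(titre_groups):
    """F statistic of a one-way ANOVA on log10-transformed titres.

    titre_groups: one list of titres per vaccine antigen group.
    """
    logs = [[math.log10(t) for t in group] for group in titre_groups]
    values = [v for group in logs for v in group]
    grand_mean = sum(values) / len(values)
    means = [sum(group) / len(group) for group in logs]

    # Variation between group means and within groups.
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(logs, means))
    ss_within = sum((v - m) ** 2 for g, m in zip(logs, means) for v in g)

    df_between = len(logs) - 1
    df_within = len(values) - len(logs)
    return (ss_between / df_between) / (ss_within / df_within)


if __name__ == "__main__":
    # Hypothetical titres (AU) for two groups of two animals each.
    print(one_way_anova_f([[10, 100], [1000, 10000]]))
```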

Results

Analysis of Polymorphisms in AMA-Variants

All Plasmodium falciparum AMA1 sequences, complete sequences and fragments, available on Jan. 3, 2005 (360 in total), were retrieved from a nucleotide search in the Pubmed database (www.ncbi.nlm.nih.gov/entrez/). Duplicates were removed (U84348 [3D7], AU087598 [FVO], U33274 [3D7] and AF061332 [KF1916]) and two partial sequences for the NF7 strain were combined (U33280 and M27957). The ensuing 355 sequences were aligned with an Excel macro and polymorphic residues were identified.

To reduce the risk of incorporating data resulting from sequencing errors, polymorphic positions were defined as those that varied between sequences such that two or more of the 355 sequences carried the same amino acid substitution at that site. Using this definition, 64 of 622 amino acid positions were designated polymorphic: 9 in the prosequence (aa 1-96; 9.4%), 33 in domain 1 (aa 97-315; 15.1%), 8 in domain 2 (aa 316-425; 7.3%), 11 in domain 3 (aa 426-545; 9.2%), none in the transmembrane region (aa 546-567; 0%) and 3 in the cytoplasmic tail (aa 568-622; 5.5%). The alignment also showed that the presence of a particular amino acid at one polymorphic position was often linked to the presence of particular amino acids downstream (FIG. 4) as well as upstream (FIG. 3) (i.e. in the direction of the C-terminus or N-terminus, respectively) of the protein.
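The definition of a polymorphic position given above can be sketched as follows. This is not the Excel macro referred to in the text; it is a minimal illustration which assumes that all sequences have already been aligned to equal length and that positions not covered by a fragment are marked with "-" or "X". The function name and the toy alignment are hypothetical.

```python
from collections import Counter


def polymorphic_positions(aligned, min_support=2):
    """Return 1-based positions where at least `min_support` sequences
    share the same substitution relative to the predominant residue."""
    positions = []
    for i in range(len(aligned[0])):
        # Count residues at this column, ignoring gaps and unknowns.
        counts = Counter(seq[i] for seq in aligned if seq[i] not in "-X")
        if not counts:
            continue
        predominant, _ = counts.most_common(1)[0]
        if any(n >= min_support for aa, n in counts.items() if aa != predominant):
            positions.append(i + 1)
    return positions


if __name__ == "__main__":
    toy_alignment = ["MKE", "MKE", "MRE", "MRE", "MKD"]
    print(polymorphic_positions(toy_alignment))
```

In the toy alignment, the substitution at the second position is carried by two sequences and is therefore counted, whereas the substitution at the third position occurs only once and is treated as a possible sequencing error.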

Software written to identify these linkages revealed downstream linkage for 24 residues at 22 polymorphic sites. The most N-terminal member of each downstream linkage group tended not to be the predominant residue at that position (in 53 of 55 cases), although it tended to be linked to downstream residues that were the predominant (consensus) residue. Unusually, at position 197 two possible amino acids (G and D) showed downstream linkage. As exceptions, at both positions 172 and 283 the predominant residue was linked to downstream residues, and at position 200 both the predominant and minor residues were linked.

Upstream linkage was observed for 26 (27) residues at 22 polymorphic sites. The most C-terminal member of the linkage group tended not to be the predominant residue at that position and the linked upstream residues were generally the predominant (consensus) residues. Exceptionally, at position 197 all residues showed upstream linkage with position 196. The E at position 285 showed an unusual upstream linkage, in that it is linked to the non-consensus L283, and R503 is upstream linked to the non-consensus residue M496.
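The software used to identify the linkages is not reproduced here. Purely as an illustration, the following sketch applies a simplified, assumed criterion: a residue at a polymorphic position is regarded as downstream linked to a residue at a later polymorphic position if every sequence carrying the former also carries the latter. The function name, the minimum number of carriers and the toy alignment are hypothetical.

```python
def downstream_links(aligned, positions, min_carriers=2):
    """Map (position, residue) to the list of (downstream position, residue)
    pairs found in every sequence that carries the residue.

    positions: 1-based polymorphic positions in ascending order.
    """
    links = {}
    for idx, i in enumerate(positions):
        for aa in {seq[i - 1] for seq in aligned}:
            carriers = [seq for seq in aligned if seq[i - 1] == aa]
            if len(carriers) < min_carriers:
                continue
            for j in positions[idx + 1:]:
                partners = {seq[j - 1] for seq in carriers}
                # Strict co-occurrence: one single partner residue.
                if len(partners) == 1:
                    links.setdefault((i, aa), []).append((j, partners.pop()))
    return links


if __name__ == "__main__":
    toy_alignment = ["AG", "AG", "DN", "DN", "AN"]
    print(downstream_links(toy_alignment, [1, 2]))
```

Under the same assumption, upstream linkage could be examined by applying the function to the reversed sequences with correspondingly renumbered positions.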

Design of Artificial Genes to Cover Diversity and Incorporate Linkages.

To reduce the complexity of downstream vaccine development, it was felt that a maximum of three diversity-covering (DiCo) sequences could be accommodated. These DiCo's (DiCo1, DiCo2 and DiCo3), each of which comprised domains I, II and III (aa 97-545), were designed such that, when taken together, the maximal number of naturally occurring residues was incorporated, with the proviso that residues with the lowest frequencies were linked, thereby restricting the sequences. Thus over 80% of amino acids present at any given position in naturally occurring sequences, and most linkages, are represented in the DiCo combination.

To restrict the design to three DiCo's, 12 positions at which less than 10% of the sequences showed variation [residues: 121 (99% E), 189 (92% L), 199 (99% R), 224 (99% M), 228 (97% N), 244 (92% D), 245 (96% K), 269 (96% K), 325 (98% H), 330 (91% S), 395 (92% K) and 505 (96% F)] and four positions at which between 10 and 16% of sequences showed variation [173 (85.7% N), 175 (89.6% D), 207 (83.7% Y) and 407 (84.4% Q)] were excluded. Thus, of the 52 polymorphic positions, the 36 most variable were included in the design of the DiCo proteins (22/33 positions, 4/8 and 10/11 in domains I, II and III, respectively).
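The counts in the preceding paragraph can be checked with a short sketch. The positions and percentages are those listed above; the variable and function names, and the use of a 10% cut-off to separate the two groups of excluded positions, are illustrative choices.

```python
# Frequency (%) of the predominant residue at each excluded position, as listed above.
EXCLUDED = {
    121: 99.0, 189: 92.0, 199: 99.0, 224: 99.0, 228: 97.0, 244: 92.0,
    245: 96.0, 269: 96.0, 325: 98.0, 330: 91.0, 395: 92.0, 505: 96.0,
    173: 85.7, 175: 89.6, 207: 83.7, 407: 84.4,
}


def split_by_variation(predominant_pct, cutoff=10.0):
    """Split positions by the share of sequences deviating from the predominant residue."""
    low = sorted(p for p, f in predominant_pct.items() if 100.0 - f < cutoff)
    higher = sorted(p for p, f in predominant_pct.items() if 100.0 - f >= cutoff)
    return low, higher


if __name__ == "__main__":
    low, higher = split_by_variation(EXCLUDED)
    polymorphic_in_domains = 33 + 8 + 11           # domains I, II and III
    included = polymorphic_in_domains - len(EXCLUDED)
    print(len(low), len(higher), included)          # 12 4 36
    print(22 + 4 + 10 == included)                  # per-domain counts add up
```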

Three backbones were designed, taking linkages into account. DiCo1 (SEQ ID Nos: 1 and 4) was fully compliant with the down- and upstream linkages. DiCo2 (SEQ ID Nos: 2 and 5) broke downstream linkage in that H296 is linked to N448 rather than D448 as in natural AMA1 sequences, and upstream linkage in that D296 is restricted by N448 whilst H296 is included. DiCo3 (SEQ ID Nos 3 and 6) incorporated the least prevalent amino acids, and two restricted downstream positions were not compliant.

Normally E206 is restricted by D197/D200/L201, while DiCo3 incorporates K206. In addition, N225 is normally restricted by L201; in DiCo3 I225 is incorporated. Two upstream linkages were not compliant: H200 and F201 are restricted by K206, but D200 and L201 are included.

Non-linked polymorphisms were subsequently incorporated respectively into DiCo1, DiCo2 and DiCo3 backbones according to their frequency of occurrence. Because AMA1 is not believed to be glycosylated in malaria parasites, NxS/NxT motifs were also changed to remove potential N-glycosylation sites (SEQ ID Nos 1 to 3). Non-polymorphic residues were changed as previously described (Kocken 2002) (T288->V, S373->D, N422->D, S423->K and N499->Q). Because N162 of DiCo 1 and DiCo2 is a linked polymorphic residue as well as a potential N-glycosylation site it was changed to Q162 to avoid introducing restrictions.
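Potential N-glycosylation sites of the NxS/NxT type can be located in a sequence with a short sketch such as the following. Excluding proline at the x position is a common convention that is assumed here and is not stated in the text above; the function name and the example sequence are hypothetical.

```python
import re

# N, any residue except proline (assumed convention), then S or T.
# The lookahead allows overlapping motifs to be reported.
SEQUON = re.compile(r"(?=N[^P][ST])")


def find_sequons(sequence, first_residue_number=1):
    """Return the residue numbers of the N in each NxS/NxT motif."""
    return [m.start() + first_residue_number for m in SEQUON.finditer(sequence)]


if __name__ == "__main__":
    print(find_sequons("ANGSQNPTNKT"))
```

For a sequence starting at the first residue of domain I, `first_residue_number` could be set to 97 so that the reported positions follow the AMA1 numbering used above.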

FIG. 5 shows all resultant DiCo sequences (SEQ ID Nos 1 to 3) in alignment with HB3 (SEQ ID NO: 30), 3D7 (SEQ ID No: 29) and FVO (SEQ ID NO: 31) strain AMA1 and with the consensus sequence (SEQ ID NO: 25) derived from alignment of all 355 input sequences. The differences between sequences are summarised in FIG. 6, which includes a profile of differences according to AMA1 domain. DiCo1 is close to the consensus sequence (differing at only 4 positions, all of which are in Domain I); as expected DiCo2 and DiCo3 differ markedly from the consensus sequence and overall the DiCo sequences differ quite considerably from one another.

DiCo Expression in Pichia pastoris

Expression in small-scale fermentors resulted in protein levels of 40 mg/L or more before purification. DiCo1 was purified using Ni-IMAC chromatography. Although a hexa-His tag was also incorporated into DiCo2 and DiCo3, these did not bind to Ni-IMAC and were instead purified by sequential hydroxyapatite and size exclusion chromatography.

Purified DiCo proteins were assessed for integrity and purity by SDS-PAGE and for antigenicity by Western blot using the reduction sensitive monoclonal antibody 4G2 (FIG. 7). The main band for all three DiCo proteins migrated with the expected size (50 kDa) and reacted with 4G2. A proportion of DiCo1 appears in a more slowly migrating band; as has been previously reported (REF) some AMA1 molecules expressed in Pichia may undergo O-glycosylation.

Gel analysis by a stain specific for glycosylation moieties (data not shown) suggests that this may be occurring for DiCo1, an observation that may explain the observed heterogeneity in migration.

Western blot analysis also revealed dimerisation for all three DiCo proteins; this has previously been observed with the full-length ectodomain (aa 25-545) GMP product, where it did not result in loss of potency.

Immunological and Functional Assessment

Groups of 5 rabbits were immunized three times with 30 μg of Pichia expressed and purified AMA1 from either FVO strain, DiCo1, DiCo2 or DiCo3 or with a mix of 10 μg of each of the three DiCo's (DiCoMix).

Antibody levels to the immunising antigen and to AMA1 from three laboratory strains were determined by ELISA (FIGS. 8 and 9). Although none of these DiCo sequences has been observed in nature, overall they were all well recognised by antibodies from rabbits that had been immunised with FVO AMA1, and they elicited antibodies that were reactive with the FVO, 3D7 and HB3 antigens, further suggesting that they attained an appropriate conformation.

Rabbits immunised with DiCoMix showed responses that overall were comparable to those obtained by immunisation with single components. Moreover, for all antigens under investigation, the variation in IgG titres was smallest in the group immunised with the DiCoMix. The responses to the FVO, 3D7 and HB3 antigens did not differ significantly between the treatment groups (p=0.27, 0.48 and 0.35, respectively).

For the DiCo1 antigen there was a tendency for the FVO immunised animals to have lower titres than the DiCo1 and DiCoMix groups (p=0.09). For the DiCo2 antigen, animals immunised with DiCo2 tended to have higher titres than those immunised with DiCoMix, and animals immunised with DiCo1 or DiCo3 had lower titres than animals immunised with DiCo2 (p=0.0035). The animals immunised with DiCo3 had significantly higher DiCo3 titres than the animals immunised with DiCo1, DiCo2 and DiCoMix (p=0.002).

To assess anti-parasitic effects, IgG purified from rabbit sera obtained two weeks after the final immunisation was added to synchronised asexual blood stage cultures of 3D7, FCR3 and HB3 strains (FIG. 10). The DiCoMix was always amongst the three best performing antigens with about 70% inhibition at 6 mg/mL for all three strains tested in the inhibition assay. Moreover, the DiCoMix performed nearly as well as the homologous (FVO) antigen for FCR3. All separate DiCo antigens elicited antibodies in rabbits that were active in inhibiting parasite growth. The DiCo3 antigen was not very effective for FCR3, but it induced good functional responses to 3D7 and HB3, despite the difference of 17 or 18 amino acids for 3D7 and HB3, respectively. These observations again suggest that all antigens attained an appropriate conformation.

For the FCR3 strain the highest levels of growth inhibition were observed for the DiCo2 group, followed by FVO, DiCoMix, DiCo1 and DiCo3, respectively. For each doubling in total IgG levels, inhibition levels increased by 16% (95% CI: 13% to 19%) for all groups except DiCo3, where the increase of inhibition per IgG doubling was significantly lower [−6.3 (95% CI: −10.5 to −2.0)]. The level of inhibition in the DiCoMix group at 1 mg/mL IgG was estimated at 31% (95% CI: 18 to 45%). There was a tendency for an increased level of inhibition at 1 mg/mL for the DiCo2 group (16% higher, 95% CI: −4 to 35%), whereas the level of inhibition at 1 mg/mL IgG was significantly lower in the DiCo3 group (−21%, 95% CI: −41 to −2).

For the HB3 strain the highest levels of growth inhibition were observed for the DiCoMix group, followed by DiCo3, DiCo1, FVO and DiCo2, respectively. Growth inhibition levels increased by 12% (95% CI: 11 to 14) for each doubling of total IgG. There were no significant between-group differences for the effect of IgG on inhibition. The level of inhibition at 1 mg/mL total IgG was estimated at 38% (95% CI: 27 to 49%). There was a tendency for the DiCo2 and FVO groups to have lower levels of inhibition at 1 mg/mL [−13 (95% CI: −29 to 3) and −14 (95% CI: −30 to 2), respectively].

For the 3D7 strain the highest levels of inhibition were observed for the DiCo3 and DiCoMix groups, followed by DiCo1, FVO and DiCo2, respectively. Inhibition levels increased by 11% (95% CI: 10 to 12) for each doubling in total IgG. There were no significant between-group differences for the effect of IgG on inhibition. The level of inhibition at 1 mg/mL total IgG was estimated at 46% (95% CI: 37 to 55%). Both DiCo2 and FVO had significantly lower levels of inhibition at 1 mg/mL total IgG [−17 (95% CI: −30 to −4) and −17 (95% CI: −30 to −3), respectively].
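The estimates reported above can be related to the inhibition of about 70% observed at 6 mg/mL with a small worked sketch. It assumes that inhibition is linear in the base-2 logarithm of the total IgG concentration, that the per-doubling increases are in percentage points, and that the estimates at 1 mg/mL for HB3 and 3D7 refer to the DiCoMix group as stated explicitly for FCR3. The names used are hypothetical, and the sketch is not the fitted mixed effects model itself.

```python
import math

# (estimated inhibition at 1 mg/mL, increase per doubling of total IgG), from the text.
ESTIMATES = {
    "FCR3": (31.0, 16.0),
    "HB3": (38.0, 12.0),
    "3D7": (46.0, 11.0),
}


def predicted_inhibition(strain, igg_mg_per_ml):
    """Inhibition (%) predicted from a log2-linear dose-response line."""
    at_one_mg, per_doubling = ESTIMATES[strain]
    return at_one_mg + per_doubling * math.log2(igg_mg_per_ml)


if __name__ == "__main__":
    for strain in ESTIMATES:
        print(strain, round(predicted_inhibition(strain, 6.0), 1))
```

Since 6 mg/mL is about 2.6 doublings above 1 mg/mL, the predicted values for the three strains fall at roughly 69 to 74%, which agrees with the inhibition of about 70% reported for the DiCoMix at that concentration.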

Discussion

The development of an effective Plasmodium falciparum AMA1-based vaccine is difficult because of the polymorphic nature of the vaccine candidate and the immune selection pressure exerted upon it. An effective vaccine is expected to induce responses to conserved determinants as well as to the broadest range of allele-specific determinants. Failure to cover a sufficient amount of the allele-specific determinants found in AMA1 will, most likely, favor the breakthrough of variants and thereby compromise vaccine efficacy.

The AMA1-based vaccines currently in development are based on one or two naturally occurring allelic forms of AMA1 (BPRC, NIH, WRAIR, Pan CPII) and may therefore not cover an amount of variation sufficient to prevent the escape of certain allelic forms.

Coverage can be improved by simply extending the number of alleles included, but cost constraints dictate that only a limited number of alleles can be included. This becomes even more important when several different vaccine candidates are to be combined, as the total numbers and amounts of protein to be included in a malaria vaccine are limited too.

An alternative way to improve coverage is through the use of a limited number of artificial variants designed to optimally cover polymorphisms. From the data presented here we can conclude that diversity covering sequences can be produced, including a sequence that is very close to the consensus. The proteins all react with a conformation-sensitive monoclonal antibody. Moreover, sera elicited by the DiCo sequences all react with naturally occurring alleles and these sequences elicit antibodies that inhibit parasite growth, indicating that each single DiCo protein is conformationally correct.

The DiCo approach could be considered somewhat naïve, as many epitopes within AMA1 are conformational and may also be discontinuous (crystal structure papers, Thomas refs). However, by incorporating most of the linked residues, stretches of naturally occurring amino acids are present in each single DiCo and therefore a number of conformational epitopes are likely to be preserved. This is corroborated by the observation that all individual DiCo sequences induce growth-inhibiting antibodies to the three laboratory strains tested.

The data presented show that a combination of 3 artificial alleles induces levels of growth inhibiting antibodies to the three strains investigated that are comparable to the levels induced by a homologous antigen (e.g. FVO antigen on FCR3), indicating that a considerable amount of diversity is covered by the combination.

Moreover, the DiCo3 antigen yielded high levels (±70%) of growth inhibiting antibody to the 3D7 and HB3 strains, despite being 17 and 18 amino acids different from the 3D7 and HB3 antigens, respectively.

A further aspect of the DiCo proteins is that they can be combined into fusion proteins, thereby further reducing the number of components in a malaria vaccine. Such a fusion protein can then be further extended by the inclusion of other, preferably small, vaccine candidates. One such candidate is MSP1₁₉, for which a fusion protein with AMA1 has been produced.

It is possible to make 3 DiCo's that completely incorporate linkage patterns. However, the strategy used here was to assign the residues according to observed prevalence, with the most prevalent going into DiCo1 and the least prevalent into DiCo3. This has led to a number of linkage breaks in DiCo's 2 and 3. 
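The prevalence-based assignment described above may be sketched as follows. The sketch deliberately ignores linkage, and it assumes that a variant for which no distinct residue remains receives the most prevalent residue again. The function name, the positions and the frequencies are hypothetical and do not reproduce the DiCo design.

```python
def assign_by_prevalence(position_freqs, n_variants=3, min_freq=0.10):
    """Assign residues to variants Y1..Yn: the most prevalent residue goes to
    the first variant and the least prevalent retained residue to the last."""
    variants = [dict() for _ in range(n_variants)]
    for pos, freqs in sorted(position_freqs.items()):
        # Keep residues reaching the frequency threshold, most prevalent first.
        ranked = sorted(
            (aa for aa, f in freqs.items() if f >= min_freq),
            key=lambda aa: freqs[aa],
            reverse=True,
        )[:n_variants]
        if not ranked:
            raise ValueError(f"no residue reaches the threshold at position {pos}")
        # Assumption: fill spare variants with the most prevalent residue.
        padded = [ranked[0]] * (n_variants - len(ranked)) + ranked
        for variant, aa in zip(variants, padded):
            variant[pos] = aa
    return variants


if __name__ == "__main__":
    hypothetical = {
        1: {"E": 0.50, "G": 0.30, "D": 0.20},
        2: {"D": 0.80, "H": 0.20},
        3: {"E": 0.99, "K": 0.01},
    }
    for number, variant in enumerate(assign_by_prevalence(hypothetical), start=1):
        print(f"Y{number}", variant)
```

A design that also honours linkage would additionally have to check each assignment against the linked residues, which is where the linkage breaks mentioned for DiCo's 2 and 3 arise.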

1-23. (canceled)
 24. An isolated protein composition comprising 2 to 10 protein variants, each protein variant having an amino acid sequence, wherein said protein variants are of a single antigen, wherein said single antigen comprises a plurality of variable amino acid positions, wherein said amino acid sequences of said protein variants, in combination, represent both a frequency of occurrence of each amino acid at said variable amino acid positions and a linkage between said variable amino acid positions, and wherein said frequency of occurrence is at least 10% in said single antigen, and wherein at least 75% of said linkage between said variable amino acid positions are presented by a combination of said protein variants.
 25. The isolated protein composition according to claim 24, wherein said single antigen is an AMA1 protein of a Plasmodium species selected from the group consisting of falciparum, vivax, knowlesi, malariae and ovale.
 26. The isolated protein composition according to claim 25, wherein said variable amino acid positions are selected from the group consisting of 162, 167, 172, 173, 175, 187, 190, 196, 197, 200, 201, 204, 206, 207, 225, 230, 242, 243, 267, 282, 283, 285, 296, 300, 308, 332, 393, 404, 405, 407, 435, 439, 448, 451, 485, 493, 496, 503, 512 and 544 of said AMA1 protein of Plasmodium falciparum.
 27. The isolated protein composition according to claim 24, wherein said protein variants comprise at least three protein variants having the amino acid sequences selected from the group consisting of SEQ ID Nos 1 to 12.
 28. The isolated protein composition according to claim 24, wherein said protein variants comprise at least three protein variants having the amino acid sequences selected from the group consisting of SEQ ID Nos 1 to 6.
 29. The isolated protein composition according to claim 24, wherein said protein variants comprise three protein variants having amino acid sequences of SEQ ID No 1, SEQ ID No 2, and SEQ ID No 3.
 30. The isolated protein composition according to claim 24, wherein said protein variants are linked.
 31. An isolated protein variant comprising an amino acid sequence selected from the group consisting of SEQ ID Nos 1 to 12.
 32. The isolated protein variant according to claim 31, wherein said amino acid sequence is selected from the group consisting of SEQ ID Nos 1 to 6.
 33. The isolated protein variant according to claim 31, wherein said amino acid sequence is selected from the group consisting of SEQ ID Nos 1 to 3.
 34. An isolated nucleic acid comprising a sequence encoding said protein variant according to claim 30.
 35. The isolated nucleic acid according to claim 34, wherein said sequence is selected from the group consisting of SEQ ID Nos 12 to 24.
 36. An expression vector comprising the isolated nucleic acid according to claim 34.
 37. The expression vector according to claim 36, wherein the expression vector is pPicZalpha or pPIC9.
 38. A host organism transformed or transfected with the expression vector according to claim 36.
 39. The host organism according to claim 38, wherein the host organism is Pichia pastoris.
 40. A medicament for vaccinating a vertebrate comprising the isolated protein composition according to claim 24.
 41. The medicament according to claim 40, wherein the vaccination is a vaccination against malaria.
 42. The medicament according to claim 40, wherein the vertebrate is a human.
 43. A method for producing a protein composition, comprising: a) determining a plurality of variable amino acid positions for an antigen; a frequency of occurrence of a variation at each said variable amino acid position; and a linkage between each variable amino acid position; b) determining a maximal variable amino acid position comprising a maximal number X of different amino acids, wherein said frequency of occurrence of said variation is at least 10%; and c) designing X protein variants representing, in combination, said frequency of occurrence of said variation at said variable amino acid position.
 44. The method according to claim 43, wherein said designing of said X protein variants further represents, in combination, at least 75% of said linkage between said variable amino acid position.
 45. The method according to claim 43, wherein step (c) comprises designing protein variants Y₁ to Y_(x) by assigning to a variable amino acid position in protein variant Y₁ a most frequent occurring amino acid at said variable amino acid position in protein variant Y₁, and to Y_(x) a less frequent occurring amino acid at said variable amino acid position in protein variant Y_(x), assigning to a corresponding amino acid position in protein variants Y₂ to Y_(x-1) either a remaining amino acid or a same amino acid depending on said frequency of occurrence of said variation at said remaining amino acid or said same amino acid. 